BUG: Internet Explorer's Random Deletions
Normally, an Internet cache stores files on your computer so that when you revisit a web page that hasn't changed (or has only partially changed), the browser doesn't have to download everything again. Instead, it displays what is stored in the cache on your computer's hard drive. Every cache has a maximum size set by you, the user. When the cache fills up (that is, reaches the maximum size you set), it is supposed to delete the files that came from the web pages you visited longest ago, with some exceptions.
Internet Explorer 3 and 4's cache manager, however, does not do this. It picks files to delete seemingly at random. (Further research has shown that the choice isn't random - read the "Details" section below if you want the technical details.) Files that came from web pages you visited only minutes ago may be chosen for deletion. This causes a major performance degradation when browsing - especially when you use the "Back" button in the browser. Web page content that has been incorrectly deleted from the cache must be transferred over the Internet again - a very slow process for modem users. It can also cause much frustration when you subscribe to content and try to browse it offline, as you may repeatedly be prompted to log back on because the subscribed content has also been deleted. CacheSentry fixes this problem by taking over the job of removing the oldest files from the cache before Internet Explorer can remove files itself.
Details of the Internet Explorer 3 and 4 bug
It turns out the deletions IE does are not random at all. In order to understand the issue, I must explain a bit about the internals of the IE cache. The cache is similar to a regular folder, programming-wise. Its contents can be listed, and as with all directories, the filenames come back to the program requesting the listing in a default order: the order in which the files are physically stored in the cache. If you were to list these files and assign them a number, the first filename returned would be #1 and the last would be #n. Someone at Microsoft assumed the lowest-numbered files were always the oldest, but this is not the case - Internet Explorer can place new files anywhere in the cache, in any order. So a file's position in the listing says nothing about how old it is, and a file deleted for being lowest-numbered could be months old or mere seconds old! This is the nature of the Internet Explorer 3 and 4 version of this bug.
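The difference between the two policies can be sketched in a few lines of Python. This is only an illustration of the idea described above, not IE's actual code: the `CacheEntry` type, the file names, the timestamps, and the sizes are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    name: str
    last_access: int  # hypothetical timestamp; larger means more recent
    size_kb: int


def evict_by_position(entries, max_kb):
    """Buggy policy: drop entries from the front of the listing until the cache fits."""
    kept = list(entries)
    while kept and sum(e.size_kb for e in kept) > max_kb:
        kept.pop(0)
    return kept


def evict_oldest_first(entries, max_kb):
    """Intended policy: drop the least recently accessed entries until the cache fits."""
    kept = list(entries)
    while kept and sum(e.size_kb for e in kept) > max_kb:
        kept.remove(min(kept, key=lambda e: e.last_access))
    return kept


# Listing order is storage order, not age: here the newest file is listed first.
cache = [
    CacheEntry("news_today.html", last_access=300, size_kb=40),
    CacheEntry("old_logo.gif", last_access=10, size_kb=40),
    CacheEntry("last_week.html", last_access=50, size_kb=40),
]

by_position = [e.name for e in evict_by_position(cache, max_kb=80)]
oldest_first = [e.name for e in evict_oldest_first(cache, max_kb=80)]
print(by_position)
print(oldest_first)
```

With these made-up entries, the position-based policy throws away the page visited most recently, while the age-based policy keeps it and removes the stale image instead.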
Details of Internet Explorer 5
In version 5, Microsoft finally fixed this bug, and IE no longer simply deletes the lowest-numbered files in the cache. IE5 uses the following criteria to remove files:
The last factor, in my opinion, leads to deletions in the IE5 cache that are almost as bad as the deletions IE 3 and 4 were doing. This factor was taken into account because a file that arrives from a web server without a modification time is supposed to indicate "dynamic content" that should be refreshed every time the page is visited. Unfortunately, this doesn't work very well in practice because several web servers are either not set up correctly, or simply don't provide a modified time for files retrieved by the browser. This leads to some recent files being deleted from the cache sooner than you expect, and can lead to headaches when browsing offline.
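To see why this factor hurts, here is a rough Python sketch. It assumes, purely for illustration, that files without a modification time are simply deleted before everything else; how IE5 really weighs this factor against the others is not known to me. The `CachedFile` type, the names, and the timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedFile:
    name: str
    last_access: int               # hypothetical timestamp; larger means more recent
    last_modified: Optional[int]   # None when the server sent no modification time


def deletion_order(files):
    """Assumed policy: files lacking a modification time go first, then oldest access."""
    # False sorts before True, so entries with last_modified=None lead the list.
    return sorted(files, key=lambda f: (f.last_modified is not None, f.last_access))


files = [
    CachedFile("archive.html", last_access=10, last_modified=5),
    CachedFile("fresh_article.html", last_access=900, last_modified=None),
    CachedFile("logo.gif", last_access=400, last_modified=100),
]

order = [f.name for f in deletion_order(files)]
print(order)
```

Under this assumption, the article fetched most recently is first in line for deletion, just because its server did not report a modification time.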
IE5 still makes some very strange deletion decisions that none of the above factors can explain, sometimes deleting content ranging from a week old to only a few hours old all at once, even when a large cache is used.